What is linked data

The term linked data is entering into common vocabulary and, as
most interests us in this instance, into the specific terminology of
library and information science. The concept is complex; we can
summarize it as that set of best practices required for publishing
and connecting structured data on the web for use by a machine.
It is an expression used to describe a method of exposing, sharing
and connecting data via Uniform Resource Identifiers (URIs) on the
web. With linked data, in other words, we refer to data published on
the web in a format readable, interpretable and, most of all, useable
by machine, whose meaning is explicitly defined by a string of
words and markers. In this way we constitute a linked data network
(hence linked data) belonging to a domain (which constitutes the
initial context), connected in turn to other external data sets (that is,
those outside of the domain), in a context of increasingly extended
relationships. Next is presented the Linked Open Data cloud (LOD),
which collects the open data sets available on the web, and the
paradigm of its exponential growth occurring in a very brief period
of time, which demonstrates the level of interest that linked data has
garnered in organizations and institutions of different types.

Figure 1: Diagram of the linked open data cloud (LOD) in 2007.

Figure 2: Diagram of the linked open data cloud (LOD) in 2009.



Figure 3: Diagram of the linked open data cloud (LOD) in 2011.

The concept of linked data is closely related to the semantic web,
although the semantic web cannot be reduced to the mere technicality
of linked data, but requires, for its construction, that certain
important rules be respected whose ultimate goal is the creation of
a layer of content accessible to automated processes. Linked data
make explicit the meanings and connections implicitly contained (or
in some cases, absent) in web resources (data, pages, programs, etc.).
The two terms - linked data and semantic web - relate to the same
semantic field and area of application. Linked data is a technology
used to realize the semantic web. To better understand the concept
we are aided by the definition that Tim Berners-Lee, inventor of
the World Wide Web (WWW), provides for semantic web: "A web of
things in the world, described by data on the web". The concept is
generic, but it contains important references: the network, the things
(the objects related), the data (no longer a record but individual
elements, atoms). This differentiates the traditional web (the hypertext
web) - constituted of documents, HTML objects, connected via
unclassified hyperlinks - from the web constituted of "real things"
(existing entities) described via data. A more precise image begins
to emerge:

• the hypertextual web or web of documents as a flat, linear, 
representation of objects; the concrete nature of the semantic 
web is in opposition to the abstract nature of the traditional 
web; 

• the semantic web or web of data as a container of things, of 
objects, rather than as a container of representations of objects: 
an idea of concreteness, in the sense that the data relate to 
the resource and participate in its nature, that is, they are an 
integral part of it, as the resource would not be representable 
without this data.

The semantic web was not born, therefore, to replace the traditional
web, but rather to extend its potential, realizing what Tim Berners-Lee
describes as a world in which "the daily mechanisms of commerce,
of bureaucracy, and of our everyday lives will be managed
by machines that interact with other machines, leaving to human
beings the task of providing them with inspiration and intuition"
(Berners-Lee and Fischetti).

The web of data is, therefore, the natural evolution of the web of
documents. Let us try to identify the distinctive features of each of
them, comparing their characteristics:

• web of documents (hypertextual web):

- analogy with a global filesystem, an expression of extreme
richness but also particularly monolithic;

- flat description of objects and documents; documents as
primary objects of description;

- network of relationships between objects made up of
relationships between documents which are neither inherent
in the objects themselves, nor form part of their structure;
links between documents; in consequence:

* semantics of the content and of the links between
documents is empirical, associated with the objects,
and thus not part of the object itself, created by a
human agent;

* low degree of structure in the objects;

* objects represented on the web designed for human
consumption, not machine-interpretable or reusable.

The hypertextual web is simple in structure, and has sparse
connections between the data. It can be imagined as an enormous notebook
in which information is noted in a linear fashion, that is, with little
structure and few relationships, and in which documents are
readable and useable only by humans.



Figure 4: Representation of the web of documents, 17th International World 
Wide Web Conference W3C Track @ WWW2008, Beijing, China 
23-24 April 2008 - Linked data: principles and state of the art. 


• web of data (semantic web): 

- analogy with a global database conceived as a relational 
database, consisting of individual objects richly related 
to each other, which in turn form larger entities; 

- articulated description of the object, a description which 
itself becomes an object in the web, because it is reusable; 
things (or descriptions of things) as primary objects of 
description; 

- network of relationships between objects inherent in the
objects themselves; links between things (including
documents); in consequence:

* semantics of the content and of the links is explicit,
expressive;

* high degree of structure in (the descriptions of) things; 

* entities designed for machines first, human beings 
second. 



Figure 5: Representation of the web of data, 17th International World Wide
Web Conference W3C Track @ WWW2008, Beijing, China 23-24
April 2008 - Linked data: principles and state of the art.

The comparison with relational databases is a basic concept in the
literature on this topic. We can read on the site of the W3C:

"The semantic web and relational databases. The semantic
web data model is very directly connected with the model of
relational databases. A relational database consists of tables,
which consists of rows, or records. Each record consists of a set
of fields. The record is nothing but the content of its fields, just
as an RDF node is nothing but the connections: the property
values. The mapping is very direct

• a record is an RDF node;

• the field (column) name is RDF propertyType; and

• the record field (table cell) is a value."

A strong point of the semantic web has always been the expression,
on the web, of the large quantity of information held in relational
databases, formulated in a machine-processable format. The RDF
data model - with its XML-based syntax - is suitable
for expressing the information in relational databases. The analogy
is appropriate as the central point of linked data is precisely the
"predicates" that express the types of relationships through which
ontologies and networks can be represented.


Last name    First name    Age    Telephone
Rossi        Mario         46     06-1234567
Verdi        Antonia       50     06-345678
                                  06-237890

(Labels in the figure: Dependent classes; Properties, for the column
names; Values, for the cells.)

Figure 6: Representation of a relational database.
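
The mapping quoted from the W3C (record as node, column name as property, cell as value) can be illustrated with a minimal Python sketch. The base URI, the `id` column used to identify each record, the property names and the helper `row_to_triples` are all assumptions introduced for illustration; only the personal data come from Figure 6.

```python
# Hypothetical base URI for the example; it does not belong to a real dataset.
BASE = "http://example.org/people/"

# Rows of a relational table: each dict is one record, keys are column names.
# The "id" column is an assumption, added so that each record can be identified.
rows = [
    {"id": 1, "last_name": "Rossi", "first_name": "Mario",
     "age": 46, "telephone": "06-1234567"},
    {"id": 2, "last_name": "Verdi", "first_name": "Antonia",
     "age": 50, "telephone": "06-345678"},
]


def row_to_triples(row, key="id"):
    """Turn one record into (subject, predicate, object) triples.

    The record becomes a node (subject), each column name a property
    (predicate) and each table cell a value (object).
    """
    subject = f"{BASE}{row[key]}"
    return [
        (subject, f"{BASE}property/{column}", value)
        for column, value in row.items()
        if column != key and value is not None
    ]


triples = [t for row in rows for t in row_to_triples(row)]
for triple in triples:
    print(triple)
```

Each record yields one triple per non-empty field, so the same data can later be regrouped by subject, by property or by value.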

The atomization of the structure of information expresses the
characteristics of the web of data; one no longer has a monolithic object,
rather a set of individual data points, minimal particles - atoms -
that can be reaggregated in different ways and for different purposes;
each attribute of the object has a value in itself, and participates in
its nature, through expressive, self-explanatory, relationships. The
entities constituted by the ensemble of atoms are assembled into a
set of structured data, each individually independent, but able to be
logically combined with other data to produce new entities. Having
given the image of the notebook to illustrate the web of documents,
we can now take the image of the mechanism (reminiscent of
Ranganathan), in which every element, independent in itself, can be
combined and reused in an infinite variety of solutions. The web
of data is, therefore, a global network of statements (or sentences)
connected through qualified and self-expressive links which become
a collection of knowledge, which is readable and understandable by
a machine, only secondarily for a person.


Linked data: the world of the internet and 
the role of libraries, archives and museums 

Why is the world of networked information so interested in the
legacy data produced by libraries, archives and museums? Why
are libraries, archives and museums equally interested in linked
data? The interest is actually reciprocal. Libraries have always
produced quality data in highly-structured bibliographic and authority
records, according to shared and widely disseminated rules, a vast
quantity of data. The world of libraries and the world of the internet
are both interested in integration into the net; the former to ensure
the visibility and usability of its data, the latter to exploit information
and create increasingly large and significant networks. The quantity
and quality of the information that populates the net are two
aspects which are often inversely proportional: much information
is of poor quality. The increase in networked information (through
publication methods that are increasingly widely-known and used,
such as, for example, self-publishing and social networks) is not, in fact,
always synonymous with quality. The exponential growth and use
of information available on the net does not coincide with increasing
trustworthiness of the records either: their degree of reliability is
low. Users must select from the sea of information retrieved to arrive
at a credible record. On which criterion to base the selection? The
authoritativeness of the source becomes the key factor; the selection
takes place at the outset, preferring to select a resource on the basis
of the authoritativeness of its creator, instead of later on, choosing
uncritically on the basis of the ranking of the records that appear on
the page. The quality of the source, the certainty of the provenance
become, therefore, crucial elements in the searcher's exploratory
process. The role of libraries, archives and museums thus becomes
relevant, due to their tradition of attention to the quality of the
information they produce. Libraries, archives, museums assume, thus,
the role of generators of quality information for the net. It is for this
reason that their data are sought after.


Legacy metadata in libraries: still functional? 

The history of library catalogues demonstrates early widespread
use of metadata, understood as information serving as a surrogate
for the resource. The evolution of data into ever more structured
and detailed records coincided with the renewed centrality of the
catalogue on which every service of the library is based, the
proliferation of formats of bibliographic resources and the central role of
automation in library systems. The main characteristics of metadata
are its:

1. nature: it is created, formed from the resource;

2. aim: to describe an object;

3. use: it must be structured in such a way as to be processable
(that is, useable) by a machine, a computer.

Libraries have long had the stable and consistent objective of sharing
information through metadata, and have always accorded importance
to its quality. Are the metadata used up to this point still
functional? Do they respond to the requirements of current information
usage? Is it enough to expose on the web the data that libraries
have produced over the centuries? Is this exposure (for example,
in MARC format) comprehensible and useable outside of a strictly
library context? Does this not risk being a niche exposure, restricted
to a narrow environment, in a closed and highly professionalized
domain?


The catalogue of the future: of the web and 
not only on the web 

We note that the data produced by libraries - the catalogues -, whose
creation required the development of standards, professional
competencies and financing, are not on the web, but isolated from the
web. Catalogues are not, in fact, integrated into the web; they are not
searchable, even though the web is the place in which most users
work, play, operate and create other information. The question,
therefore, is: "How to modify catalogues and data so that they can
be of the web and not only on the web?". It is exactly the philosophy
that underlies linked data technology that can offer an interesting
starting point for achieving this strategic goal, on pain of death for
catalogues, abandoned by users in favour of other information
retrieval tools, such as search engines. It is a fundamental transition:
the inevitable adoption of linked data will bring about a new
revolution, even more radical than that of the 1970s, which saw the passage
from the card catalogue to the automated catalogue and then on
to the computerized catalogue, a revolution which crowned the
role that information technology has assumed in the management
of communication processes and, therefore, as concerns us more
closely, in the creation of mediation tools between the bibliographic
universe and the user. On the Record, the report of the Library of
Congress Working Group on the Future of Bibliographic Control,
gives sound guidance in achieving this goal; the change implies:

1. the transformation of textual description into a set of data
usable for automatic processing by machines;

2. the need to render data elements uniquely identifiable within
the information context of the web;

3. the need for data to be compatible with the technologies and
standards of the web;

4. the need, in short, to use a language that is in reality
interoperable across the web.

The concept of unique identification of objects is of particular
interest: the object identified, characterized as being the same thing
regardless of its textual expression (having, thus, the same
meaning) should have a unique identifier, so as to be useable in diverse
contexts (libraries, publishers, booksellers, distributors, producers
of online biographies ...), as well as through the use of different
textual values.

Tim Berners-Lee identified four rules for the creation of linked
data on the web:

1. use URIs (Uniform Resource Identifiers) to identify things
(objects): URI is a system of global identification, thus valid for
all resources contained on the entire web. URI is a keystone
of web architecture, inasmuch as it constitutes a mechanism
of resource identification common to the whole web. Each
resource on the web (a site, a page within a site, a document,
any object) must be identified by a URI to be found by other
systems, used, linked, etc.;

2. use HTTP URIs so that these things can be looked up by people
and user agents (browsers, software ...): the scheme used to
construct a URI is declared in the URI itself prior to the colon
(:); for example, http://weather.example.com/. The http scheme
relies on the HyperText Transfer Protocol, which is precisely
the scheme prescribed for the semantic web;

3. when someone looks up a URI, provide useful information,
using the standards (RDF, SPARQL (a query language devised
for linked data)): it is necessary to define the context and the
characteristics of the resources, through the attribution of the
resource itself to a class, the identification of its properties and
the assignment of values;

4. include links to other URIs, so that they can discover more
things: the more the data are linked, the more they can be used
for enrichment and the deduction of information.
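
The second rule can be checked mechanically. The following is a minimal sketch using Python's standard `urllib.parse` module to read the scheme that precedes the colon. The helper `is_http_uri` is a hypothetical name, and accepting `https` alongside `http` is an assumption of this sketch rather than something stated in the rules.

```python
from urllib.parse import urlparse


def is_http_uri(uri):
    """Return True when the URI uses the http (or, by assumption, https) scheme."""
    return urlparse(uri).scheme in ("http", "https")


parts = urlparse("http://weather.example.com/")
print(parts.scheme)  # the part of the URI that precedes the colon
print(parts.netloc)  # the host that would be contacted to look the URI up

# A URN identifies a thing but cannot be looked up over HTTP.
print(is_http_uri("urn:isbn:0451450523"))
```

An identifier that fails this check may still be a valid URI (rule 1), but a person or user agent cannot dereference it with an ordinary web request.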


Linked data: RDF (Resource Description 
Framework) 

Producing linked data means, therefore, expressing the meaning of
information, making it shareable among different applications and
useable by applications other than those for which it was originally
created. The data model used to structure linked data is RDF, a
flexible standard proposed by the W3C to characterize semantically
both resources and the relationships which hold between them. We
have defined the reality of the web as a global network of statements
(or sentences) linked via qualified links. The RDF model codifies the
data in the form of statements comprised of:

1. subject: the portion of the sentence that identifies the thing
that is described;

2. predicate: the property of the thing specified by the sentence;

3. object: the value of the property of the thing (the RDF triple).

Examples:

Alberto Moravia is the author of La noia

Bompiani published Il nome della rosa

Alberto Moravia is the pseudonym of Alberto Pincherle

Each element of the triple, Tim Berners-Lee reminds us, can, or
rather should, be represented via URI. The more URIs are
used, the more the information is reusable; this is not strictly required,
however, and elements of the triple can be expressed even in textual format. The
statements, or triples, are expressed in RDF in the form of graphs
(nodes and arcs) which represent the resources, their properties and
their respective values.
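
As a rough illustration of statements forming a graph, the sketch below stores triples as Python tuples and indexes them by subject, so that arcs can be followed from node to node. The URIs follow the style of the examples in this paper; the `name` and `title` properties and the helper `objects` are assumptions added for the sketch.

```python
from collections import defaultdict

# URIs in the style of this paper's examples (not dereferenceable).
MORAVIA = "http://masterABC.unifi/examples/1234"
IS_AUTHOR_OF = "http://masterABC.unifi/examples/isAuthorOf"
LA_NOIA = "http://masterABC.unifi/examples/opere/5678"
HAS_NAME = "http://masterABC.unifi/examples/name"    # assumed property
HAS_TITLE = "http://masterABC.unifi/examples/title"  # assumed property

triples = [
    (MORAVIA, IS_AUTHOR_OF, LA_NOIA),        # object is a URI: another node
    (MORAVIA, HAS_NAME, "Alberto Moravia"),  # object is a textual value
    (LA_NOIA, HAS_TITLE, "La noia"),
]

# Index the graph: each node maps to its outgoing (arc, target) pairs.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))


def objects(subject, predicate):
    """Return the values reached from a node by following one kind of arc."""
    return [o for p, o in graph[subject] if p == predicate]


# Follow the arc from the author to the work, then read the work's title.
for work in objects(MORAVIA, IS_AUTHOR_OF):
    print(objects(work, HAS_TITLE))
```

Because the object of the first triple is itself a URI, it can serve as the subject of further statements, which is what allows isolated triples to join into a network.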

The triples are encoded via an XML-based syntax (RDF/XML) to
make them readable, interpretable and understandable by machine,
which can be the one for which the data was created (the native
system) or a system other than (external to) the one for which it was
originated. This is the most important characteristic, which opens
the data to the global information community.

Let us observe the following assertions:

[Figure 7 shows a triple as two nodes joined by an arc: the subject
http://masterABC.unifi/examples/1234, the predicate "is the author of"
(http://masterABC.unifi/examples/isAuthorOf) and the object "La noia"
(http://masterABC.unifi/examples/opere/5678 or
http://masterABC.unifi/examples/opere/La noia).]

Figure 7: Representation of a triple (nodes and arcs) in RDF.

Figure 8: Representation of a network of assertions or triples.
    <rdf:Description rdf:about="http://masterABC.unifi/examples/1234">
      <name xml:lang="en">Alberto Moravia</name>
      <isAuthorOf rdf:resource="http://masterABC.unifi/examples/opere/5678"/>
    </rdf:Description>

    <rdf:Description rdf:about="http://masterABC.unifi/examples/opere/5678">
      <type xml:lang="en">Book</type>
      <type xml:lang="it">Libro</type>
      <title>La noia</title>
    </rdf:Description>

Figure 9: Representation of a triple in RDF/XML.


Marco is the son of Gianni 
Susanna is the daughter of Gianni 
Gianni is the son of Chiara 

From these simple assertions it is possible to recover at least three 
others, even though not made explicit with triples: 

Marco and Gianni are male 
Susanna is female 

Chiara is the grandmother of Marco and Susanna 

and we could deduce even more, for example: 

Marco and Susanna are grandchildren of Chiara 
Marco is the brother of Susanna 
Susanna is the sister of Marco 

This mechanism, termed inference - the process through which,
from a proposition accepted as true, one can pass to a second
proposition whose truth-value is inferred from the content of the first - is
the principle governing the engines that are behind the semantic
web, which infer knowledge via paths. Each new statement,
expressed in the form of triples and, therefore, in graphs, becomes
in turn the generator of new information; the more the spheres of
belonging of these statements (data sets) grow and intersect, the
more the semantic network present and available on the web is
enriched and becomes categorized information. The mechanism
of inference is well-known in logic and mathematics (inferential
calculus) and is widely used in computer applications. It acquires a
particular flavour when applied to the library world; the mechanism
explains, in fact, the relationships present in bibliographic data but
not always evident, and of which we became fully conscious with
the theoretical systematization accomplished by FRBR: a
systematization of concepts existing in cataloguing tradition, at least from
Cutter onwards, and made increasingly explicit.
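
The family example can be reproduced with a small rule-based sketch in Python. The predicate names (`sonOf`, `daughterOf`, `childOf`, `grandchildOf`, `siblingOf`, `gender`) and the three hand-written rules are assumptions chosen for illustration; a real semantic web engine would take such rules from an ontology rather than from code.

```python
# Explicit statements, written as (subject, predicate, object) triples.
facts = {
    ("Marco", "sonOf", "Gianni"),
    ("Susanna", "daughterOf", "Gianni"),
    ("Gianni", "sonOf", "Chiara"),
}


def infer(facts):
    """Apply the rules repeatedly until no new triple can be derived."""
    known = set(facts)
    while True:
        new = set()
        # Rule 1: "son of" / "daughter of" imply a gender and a child relation.
        for s, p, o in known:
            if p == "sonOf":
                new |= {(s, "gender", "male"), (s, "childOf", o)}
            elif p == "daughterOf":
                new |= {(s, "gender", "female"), (s, "childOf", o)}
        children = [(s, o) for s, p, o in known if p == "childOf"]
        for a, parent_a in children:
            for b, parent_b in children:
                # Rule 2: the child of someone's child is a grandchild.
                if b == parent_a:
                    new.add((a, "grandchildOf", parent_b))
                # Rule 3: two different people with the same parent are siblings.
                if parent_a == parent_b and a != b:
                    new.add((a, "siblingOf", b))
        if new <= known:
            return known
        known |= new


inferred = infer(facts)
for triple in sorted(inferred - facts):
    print(triple)
```

With these rules nothing is derived about Chiara's gender, because no statement describes her as a daughter or a son; concluding that she is a grandmother, as a human reader does from the name, requires one further explicit statement or rule.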

For this mechanism to work, a technological infrastructure must
be used in which concepts are identified uniquely and in which
software agents recognize these objects and realize associations and
equivalences among them, through reference to ontologies, formal
representations, shared and explicit to specific domains of
knowledge. Ontologies permit the representation of entities through the
description of their characteristics and the identification of the
relationships holding among them, and thus of the semantics that links
such entities, used primarily to realize categorizations and deductive
reasoning. Examples of vocabularies and ontologies widely-known
in the library world are:

FOAF (Friend Of A Friend) an ontology used to describe persons,
their activities, their relationships with other persons or things,
very useful in structuring authority files in linked data;

SKOS (Simple Knowledge Organization System) a family of formal
languages created to represent thesauri, classification
schemes, taxonomies, subject headings systems and every
type of controlled vocabulary.

IFLA is concentrating on publishing its own standards in RDF with
the creation of vocabularies and ontologies for FRBR, FRAD, FRSAD
and ISBD, published in the Open Metadata Registry (previously the
NSDL Registry), a space created by the W3C to support developers
and users of controlled vocabularies, hosting ontologies from
different fields, among which are the vocabularies for RDA (Resource
Description and Access), the new cataloguing standard that replaces
AACR2 (Anglo-American Cataloguing Rules, 2nd edition) created
by the Anglo-American library community, expanded with reference
to the European context (France in particular) and offered to
the international bibliographic and library community.

Ontologies are necessary, therefore, to create and publish a dataset,
which expresses a domain of belonging representing a kind of
collection of resources (or graphs), having some characteristic in common,
and identified via dereferenceable URI. Examples of datasets
available on the web are:

DBpedia dataset containing data extracted from Wikipedia;

LinkedMDB dataset on the world of cinema;

VIAF Virtual International Authority File.

Let us try to elaborate possible inferences combining data present in
these datasets:

Eduardo De Filippo was alive between 1900 and 1984 (from VIAF)

Eduardo De Filippo is the author of Filumena Marturano (from VIAF)

Eduardo De Filippo was born in Naples (from DBpedia)

Naples is the capital of the Region of Campania (from DBpedia)

Questi fantasmi is a film directed by Eduardo De Filippo (from LinkedMDB)

Massimo Troisi is the director of Ricomincio da tre (from DBpedia)

Massimo Troisi was born in Naples (from DBpedia)

Ricomincio da tre is a film from 1981 (from LinkedMDB)

Scusate il ritardo is a film directed by Massimo Troisi (from LinkedMDB)

If we wanted to create a dataset relating to celebrities from Campania
who have distinguished themselves in literature and cinema we
could use the triples above, extracted from various data sets, to
feed into our set and infer in this way new information: Eduardo
De Filippo and Massimo Troisi are 20th century celebrities from
Campania, literary authors and filmmakers.
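
A minimal sketch of this combination, in Python, merges statements from the three sources and then queries the merged set. It assumes that the same person or place carries the same identifier in every source (plain strings stand in for URIs here), and it treats only people born in the regional capital as being from the region; the predicate names and both helper functions are hypothetical.

```python
# Statements taken from the text, grouped by the dataset they are attributed to.
viaf = {
    ("Eduardo De Filippo", "authorOf", "Filumena Marturano"),
}
dbpedia = {
    ("Eduardo De Filippo", "bornIn", "Naples"),
    ("Massimo Troisi", "bornIn", "Naples"),
    ("Massimo Troisi", "directorOf", "Ricomincio da tre"),
    ("Naples", "capitalOf", "Campania"),
}
linkedmdb = {
    ("Eduardo De Filippo", "directorOf", "Questi fantasmi"),
    ("Massimo Troisi", "directorOf", "Scusate il ritardo"),
}

# Merging works only because the sources share identifiers (an assumption here).
merged = viaf | dbpedia | linkedmdb


def people_from(region, triples):
    """People born in a city that is the capital of the given region."""
    cities = {s for s, p, o in triples if p == "capitalOf" and o == region}
    return {s for s, p, o in triples if p == "bornIn" and o in cities}


def roles(person, triples):
    """Roles derivable for a person from authorship and direction statements."""
    found = set()
    for s, p, o in triples:
        if s == person and p == "authorOf":
            found.add("author")
        if s == person and p == "directorOf":
            found.add("filmmaker")
    return found


for person in sorted(people_from("Campania", merged)):
    print(person, sorted(roles(person, merged)))
```

No single source contains enough statements to answer the query; the answer emerges only from the merged set, which is the point of interlinking datasets.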


Open Linked Data Project 

How accessible are these datasets, and what are the ways to make
them truly usable for the wider community? Each institution could
produce its own linked data, as defined by the criteria and rules
mentioned above, but not make them open for use on the web. For a
dataset to be open (and therefore not subject to commercial licenses
or use restrictions) it must be published as defined by the Open
Linked Data Project, which provides for the conversion of existing
datasets or the production of new ones, according to linked data
principles, but with open licenses. The project, kicked off initially
with the participation of small organizations, and researchers and
developers in universities, has, over time, gained numerous
adherents among larger, more authoritative organizations and institutions,
among them the BBC, Thomson Reuters and the Library of Congress.
This level of adherence and dissemination among respected,
recognized and prevalent circles has resulted in the remarkable growth
and expansion of the project, facilitated by its open nature: anyone
can participate by publishing a set of data that respects the
principles of linked data and creating cross-links (interlinking) with other
existing datasets.


Library Linked Data Project 

The W3C Library Linked Data Incubator Group was founded to
support and favour the development and growth of the
interoperability of library, archival and museum data on the web. It followed
the principles of linked data and the semantic web, and the group's
work was carried out in close collaboration with the actors in these
areas. Interesting use cases for the writing of the Final report of
the Incubator Group (available at:
http://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/) were
provided by the projects supported by organizations small and
medium, as well as by the large national libraries. The
Final report began with the analysis of ongoing projects and defined
an overall picture; it can be summarized as follows:

• analysis of the benefits possible from the application of the
principles of linked data in the library sector;

• discussion of open issues with particular reference to
traditional data;

• analysis and enumeration of linked data projects and
initiatives in the library sector;

• discussion of issues relating to legal rights and to publication;

• making of recommendations for next steps in the process of
applying the principles of linked data to the sector.
Life cycle of linked data

What are the steps that an organization must take to process its own
data and publish it as linked data? A good methodological
reference is provided by Boris Villazon-Terrazas ("Methodological
guidelines for publishing linked data"), which sets out
the life cycle for the production of linked data in 7 steps:

1. identification of the data sources;

2. generation of the ontology model, with the adoption of existing
ontologies, expressed in OWL (Web Ontology Language) or
RDF(S), or with the creation (more complex) of new ontologies;

3. generation of data in RDF format, through various available
mapping languages, also in relation to the original format
of the data. In this phase the most delicate operation is the
creation of URIs, as these are the key to aligning heterogeneous
resources drawn from different sources;

4. publication of the RDF data;

5. data cleaning, to identify possible conversion
errors and make the data qualitatively useable;

6. linking the RDF data with other existing data sets, with the
identification of datasets of interest that can become linking
targets, identifying relationships between individual data,
validating the relationships thus identified;

7. enabling concrete use of the data, through various steps,
among which the publication of the resulting dataset on the
CKAN Registry (Comprehensive Knowledge Archive Network),
a registry for the publication of open data and packages,
which makes their discovery, sharing and reuse possible.
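
Step 3, the generation of RDF with newly created URIs, might be sketched as follows in Python. The namespace `http://example.org/`, the vocabulary terms under `vocab/`, the label-based URI pattern and all three helper functions are assumptions made for illustration; the output lines follow the N-Triples syntax (subject, predicate and object, closed by a full stop), one of the standard RDF serializations.

```python
import re
import unicodedata

BASE = "http://example.org/"  # hypothetical namespace


def mint_uri(kind, label):
    """Build a URI from a label: lowercase ASCII words joined by hyphens."""
    ascii_label = (
        unicodedata.normalize("NFKD", label)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")
    return f"{BASE}{kind}/{slug}"


def literal(text):
    """Quote a plain string as an N-Triples literal, escaping special characters."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(record):
    """Convert one bibliographic record (a dict) into N-Triples lines."""
    author = mint_uri("person", record["author"])
    work = mint_uri("work", record["title"])
    return [
        f"<{author}> <{BASE}vocab/name> {literal(record['author'])} .",
        f"<{work}> <{BASE}vocab/title> {literal(record['title'])} .",
        f"<{author}> <{BASE}vocab/isAuthorOf> <{work}> .",
    ]


lines = to_ntriples({"author": "Alberto Moravia", "title": "La noia"})
print("\n".join(lines))
```

Deriving URIs from labels keeps the sketch short, but two different works with the same title would collide; production systems often mint URIs from stable record identifiers instead, which is one reason the creation of URIs is described as the most delicate operation.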
The 5 stars of open linked data

A dataset obtained with the 7 steps suggested by Boris Villazon-Terrazas
can then be evaluated via a ratings system defined by Tim
Berners-Lee to assign a score to sites that expose data on the web,
termed the 5 stars of open linked data:

☆ make your stuff available on the web (whatever format);

☆☆ make it available as structured data (e.g. Excel instead of image scan of a table);

☆☆☆ non-proprietary format (e.g. CSV instead of Excel);

☆☆☆☆ use URIs to identify things, so that people can point at your stuff;

☆☆☆☆☆ link your data to other people's data to provide context.

The assessment of the open linked data produced must be carried
out considering, therefore, five fundamental aspects:

1. the data are available on the web (in whatever format);

2. the material put on the web is available as structured data (for
example, in Excel instead of as a scanned image of a table);

3. non-proprietary formats have been chosen (for example, CSV
instead of Excel);

4. URIs have been used to identify the objects, so that users can point
to these objects;

5. the data are linked to data produced by others so as to
define a context.
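
The five aspects can be turned into a simple scoring function. The sketch below, in Python, assumes that the stars are cumulative, that is, a star counts only if all the preceding ones have been earned; the `Dataset` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Answers to the five assessment questions for one dataset."""
    on_web: bool = False
    structured: bool = False
    non_proprietary: bool = False
    uses_uris: bool = False
    linked_to_others: bool = False


def stars(ds):
    """Count stars in order, stopping at the first criterion that is not met."""
    criteria = [
        ds.on_web,
        ds.structured,
        ds.non_proprietary,
        ds.uses_uris,
        ds.linked_to_others,
    ]
    score = 0
    for met in criteria:
        if not met:
            break
        score += 1
    return score


# A CSV file published on the web, with no URIs and no links to other data.
csv_on_web = Dataset(on_web=True, structured=True, non_proprietary=True)
print(stars(csv_on_web))
```

Under this reading, a dataset that uses a non-proprietary format but is not structured would still receive only one star, since the second criterion fails before the third is considered.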

Tim Berners-Lee's indications for the assessment of open linked data 
were followed by a series of recommendations, suggestions and 
ways to establish ever more precise norms and rules for evaluation, 
to arrive at a standard as participatory and shared as possible. 
Abstract: The paper defines linked data as a set of best practices used
to publish data on the web for use by a machine; the technology (or mode of realization)
of linked data is associated with the concept of the semantic web. It is the area of
the semantic web, or web of data, as defined by Tim Berners-Lee: "A web of things
in the world, described by data on the web". The paper highlights the continuities
and differences between the semantic web and the traditional web, or web of documents.
The analysis of linked data takes place within the world of libraries, archives and
museums, traditionally committed to high standards for structuring and sharing of
data. The data, in fact, assume the role of generating quality information for the
network. The production of linked data requires compliance with rules and the
use of specific technologies and languages, especially in the case of publication of
linked data in open mode. The production cycle of linked data may be the track, or a
guideline, for institutions that wish to join projects to publish their data. Data quality
is assessed through a rating system designed by Tim Berners-Lee.

Keywords: Library linked data; RDF; Semantic web 